Integration of EGA secure data access into Galaxy
Abstract
High-throughput molecular profiling techniques routinely generate vast amounts of data for translational medicine studies. Because these data are personally identifiable, secure, access-controlled systems are needed to manage, store, transfer and distribute them. The European Genome-phenome Archive (EGA) was created to facilitate access to, and management of, long-term archives of biomolecular data. Each data provider is responsible for ensuring that a Data Access Committee is in place to grant access to data stored in the EGA. Moreover, the transfer of data during upload and download is encrypted. ELIXIR, a European research infrastructure for life-science data, initiated a project (the 2016 Human Data Implementation Study) to understand and document the ELIXIR requirements for secure management of controlled-access data. As part of this project, a full ecosystem was designed to connect archived raw experimental molecular profiling data with interpreted data and the computational workflows, using the CTMM Translational Research IT (CTMM-TraIT) infrastructure (http://www.ctmm-trait.nl) as an example. Here we present the first outcome of this project: a framework that enables the secure download of EGA data to a Galaxy server. Galaxy provides an intuitive user interface for molecular biologists and bioinformaticians to design and run data analysis workflows. More specifically, we developed a tool, ega_download_streamer, that can download data securely from EGA into a Galaxy server, where the data can subsequently be processed further. This tool allows a user to run, from within the browser, an entire analysis involving sensitive data from EGA, and to make this analysis available to other researchers in a reproducible manner, as shown in a proof-of-concept study. The tool ega_download_streamer is available in the Galaxy Tool Shed: https://toolshed.g2.bx.psu.edu/view/yhoogstrate/ega_download_streamer.
Similar resources
EGA: Quick tour
Participants in medical or genetic research projects have typically provided consent for their data to be used in research but not for open public distribution. These data require a secure archiving, processing and disseminating service that respects the original informed consent agreements. The EGA was created as a service to make sure that all such data can be made available for the researche...
Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data
The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for regular clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has been growing explosively, requiring robust human data management to organise and integrate the data efficiently. For this reaso...
Authorization models for secure information sharing: a survey and research agenda
This article presents a survey of authorization models and considers their 'fitness-for-purpose' in facilitating information sharing. Network-supported information sharing is an important technical capability that underpins collaboration in support of dynamic and unpredictable activities such as emergency response, national security, infrastructure protection, supply chain integration and emerg...
Visualization Framework for the Integration and Exploration of Heterogeneous Geospatial Data
This paper presents an interactive visualization framework for heterogeneous geospatial data developed in the context of an interdisciplinary research project that aims at the risk analysis of sea-dumped chemical weapons in the Baltic Sea. The analysis focuses on geophysical, hydrographical, geochemical, and biological data acquired on research cruises as well as data produced by toxic com...
A guide and best practices for R / Bioconductor tool integration in
Galaxy provides a web-based platform for interactive, large-scale data analyses, which integrates bioinformatics tools written in a variety of languages. A substantial number of these tools are written in the R programming language, which enables powerful analysis and visualization of complex data. The Bioconductor Project provides access to these open source R tools and currently contains over...